Information Content of a Phylogenetic Tree in a Data Matrix

Authors

  • Tania Roy
  • Hsieh Fushing
  • Xunde Li
  • Brenda McCowan
  • Rob Atwill
Abstract

Phylogenetic trees in genetics, and in biology in general, are all binary. We attempt to answer one fundamental question: is such binary branching, from the coarsest to the finest scales, sustained by data? We convert this question into an equivalent one: where is the structural information of a tree in a data matrix? Results on this conceptual and computational issue lead us to a negative answer: splitting each branch into two at every internal node of a tree, from the top to the bottom levels, is a man-made structure. The data-driven computing paradigm Data Mechanics is employed here to reveal that the information of a tree is composed of a set of selected temperatures (or scales), each of which has a clustering composition strictly regulated by a temperature-specific cluster-sharing probability matrix. The resultant Data Cloud Geometry (DCG) tree on the space of species is proposed as the authentic structure contained in the data. In particular, each core cluster on the finest scale, the bottom level of the DCG tree, should not be further partitioned because of its uniformity. Beyond the finest scale, the branching of the DCG tree is primarily based on probability, which induces an ultrametric satisfying the super triangular inequality. This ultrametric property differentiates the DCG tree from all popular trees based on the hierarchical clustering (HC) algorithm, which typically employs an empirical, often ad hoc distance measure. Since this measure is regulated by the triangular inequality, it is not capable of producing a “flat” branch, in which all members (more than two) are at equal distances from one another. We demonstrate such information content first on an illustrative zoo data set, and then on two genomic data sets.
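
The super triangular inequality mentioned in the abstract is usually stated as d(x, z) <= max(d(x, y), d(y, z)) for every triple of points. The following is a minimal sketch of a check for that property on a small distance matrix; it is not the authors' DCG algorithm, and the function name `is_ultrametric` and both example matrices are hypothetical illustrations. It shows that a "flat" branch of three members at equal pairwise distances satisfies the inequality, while three points on a line, which obey only the ordinary triangular inequality, do not.

```python
from itertools import permutations


def is_ultrametric(dist, tol=1e-12):
    """Check d(i, k) <= max(d(i, j), d(j, k)) for every ordered triple.

    `dist` is a square, symmetric matrix given as a list of lists.
    This is an illustrative helper, not part of any DCG implementation.
    """
    n = len(dist)
    return all(
        dist[i][k] <= max(dist[i][j], dist[j][k]) + tol
        for i, j, k in permutations(range(n), 3)
    )


# Hypothetical "flat" branch: three species, all pairwise distances equal.
flat_branch = [
    [0.0, 1.0, 1.0],
    [1.0, 0.0, 1.0],
    [1.0, 1.0, 0.0],
]

# Hypothetical points at positions 0, 1, 2 on a line: a valid metric
# under the ordinary triangular inequality, but not an ultrametric,
# because d(0, 2) = 2 exceeds max(d(0, 1), d(1, 2)) = 1.
line_points = [
    [0.0, 1.0, 2.0],
    [1.0, 0.0, 1.0],
    [2.0, 1.0, 0.0],
]

print(is_ultrametric(flat_branch))
print(is_ultrametric(line_points))
```

In an ultrametric, every triple of points forms an isosceles triangle whose two longest sides are equal, which is why equal-distance groups of more than two members are admissible.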


Similar resources

A comparative phylogenetic analysis of Theileria spp. by using two "18S ribosomal RNA" and "Theileria annulata merozoite surface antigen" gene sequences

More than 185 species, strains and unclassified Theileria parasites are categorized in the Entrez Taxonomy. The accurate diagnosis and proper identification of the causative agents are important for understanding the epidemiology, prevention and appropriate treatment. This study aims to discuss the importance of two genes of Theileria annulata 18S ribosomal RNA (18S rRNA) and Theileria annulata...


Analysis of genetic diversity between and within Iranian accessions of spinach (Spinacia oleraceae) by SRAP markers

Spinach (Spinacia oleracea L.) is an economically important leafy vegetable crop in many countries. This is the first case study of using SRAP markers to analyze genetic diversity of Iranian spinach accessions. Eight SRAP primer combinations generated 88 scorable bands ranging from 50 to 1000 bp, among which 73 were polymorphic, with an average of 82.9 polymorphic bands per primer combination a...


RadCon: phylogenetic tree comparison and consensus

SUMMARY RadCon is a Macintosh program for manipulating and analysing phylogenetic trees. The program can determine the Cladistic Information Content of individual trees, the stability of leaves across a set of bootstrap trees, produce the strict basic Reduced Cladistic Consensus profile of a set of trees and convert a set of trees into its matrix representation for supertree construction. AVA...


A preliminary study on phylogenetic relationship between five sturgeon species in the Iranian Coastline of the Caspian Sea

The phylogenetic relationship of five sturgeon species in the South Caspian Sea was investigated using mtDNA molecule. Sequence analysis of mtDNA D-loop region of five sturgeon species [Great sturgeon (Huso huso), Russian sturgeon (Acipenser gueldenstaedtii), Persian sturgeon (Acipenser persicus), Ship sturgeon (Acipenser nudiventris), Stellate sturgeon (Acipenser stellatus)] and DNA sequencing...


Direct Molecular Detection and Phylogenetic Tree Analysis of Gastrointestinal Protozoan Parasites (Giardia lamblia, Entamoeba histolytica, Cryptosporidium parvum) from Diarrhea Infection in Kut City of Iraq: A Short Communication

Background: The intestinal tract of human can be infected by protozoan parasites. In this short communication, the stool samples were collected from patients with diarrhea referred to Kut hospital, Iraq, and then the parasites (Giardia lamblia, Entamoeba histolytica, Cryptosporidium parvum) were considered for molecular identification. Methods: Stool samples were collected from 69 patients wit...




Publication date: 2018